Dynamic text-to-speech response from a smart speaker

ABSTRACT

A method of operating a situationally aware speaker associated with a virtual personal assistant (VPA) service provider comprises receiving an indication of at least one parameter of an environment proximate the situationally aware speaker, and delivering a response to a vocal query of a user, formatted as speech, through an audio output of the situationally aware speaker, at least one audio parameter of the response being set based on the indication of the at least one parameter.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. § 120 as a continuation of U.S. patent application Ser. No. 15/607,101, titled “DYNAMIC TEXT-TO-SPEECH RESPONSE FROM A SMART SPEAKER,” filed May 26, 2017, the disclosure of which is incorporated herein in its entirety for all purposes.

TECHNICAL FIELD

Aspects and implementations of the present disclosure are directed generally to customizing responses of a smart speaker to commands or queries by a user based at least in part upon one or more parameters of the environment around the smart speaker.

BACKGROUND

Smart speakers having access to virtual personal assistant (VPA) services are devices that respond to user queries, which may be in the form of spoken queries, by searching for a response to the query of the user, for example, using the internet, and providing the response to the user, often in the form of an audible response such as synthesized speech. Smart speakers having access to VPA services may also respond to user commands to play audio from a specified audio source, for example, an internet radio station, or to control a smart device, for example, to turn on or off a light or change a setting of another smart device that the smart speaker has access to, for example, via Wi-Fi signals either directly or through an internet router of the user. Queries or commands are typically provided to a VPA through a smart speaker or other device by a user after the user presses a button or says a wake up word or phrase, for example, “Alexa,” that indicates to the smart speaker or other device that the user is addressing the VPA. VPA enabled devices are becoming more prevalent with various companies providing competing devices, for example, the Echo™ device from Amazon, Google Home™ device from Google, and various devices incorporating the Siri™ application from Apple. Current smart speakers are not situationally aware. They lack the ability to, for example, detect parameters of the environment around them such as location of a person, a number of people around the smart speaker, or ambient noise levels. Current smart speakers cannot tailor VPA responses to user queries or commands based on environmental parameters.

SUMMARY

In accordance with an aspect of the present disclosure, there is provided a method of operating a situationally aware speaker. The method comprises receiving an indication of at least one parameter of an environment proximate the situationally aware speaker, receiving audio information at the situationally aware speaker from a virtual personal assistant, and modifying the audio information based on the indication of the at least one parameter. Receiving the audio information may comprise receiving an audio response to a query spoken to the situationally aware speaker by a user. The method may further comprise rendering the audio response through the situationally aware speaker.

In some implementations, the method comprises modifying the audio information by setting a volume of the response based on the indication of the at least one parameter. The at least one parameter may be one or more of a volume of the query, a volume of background noise sensed by an audio sensor of the situationally aware speaker, or an identity of the user.

Modifying the audio information may include one or more of adjusting a volume, a tone, an equalization, or a speed of rendering of the audio information. The at least one parameter may include one or more of a volume of a query provided to the situationally aware speaker, a volume of background noise, a frequency spectrum of background noise, an identity of the user, a location of the user, a time of day, a physiological parameter of a person proximate the situationally aware speaker, a response by the user to a previous response provided by the situationally aware speaker, or a state of activity of one or more devices proximate the situationally aware speaker.

In some implementations, the method comprises setting a volume of the response based on a location of the user. The method may further comprise selecting a speaker through which to deliver the response based on the location of the user or setting a volume of the response based on a distance of the user from the situationally aware speaker.

In some implementations, the method comprises setting a volume of the response based on one or more of a time of day, a physiological parameter of a person within hearing distance of the situationally aware speaker, a response by the user to a previous response provided by the situationally aware speaker, or a state of activity of one or more devices proximate the situationally aware speaker.

In some implementations, the method comprises setting a tone of the response based on the indication of the at least one parameter. The method may include formatting the response as one of a simulated whisper, a simulated shout, or with low frequency components of the response removed. The tone of the response may be set based on one or more of a volume of the vocal query, a volume of background noise sensed by the audio sensor, an identity of the user, a location of the user, a time of day, a physiological parameter of a person within hearing distance of the situationally aware speaker, a frequency spectrum of background noise sensed by the audio sensor, or a response by the user to a previous response provided by the situationally aware speaker.

In some implementations, the method comprises setting a speed of simulated speech of the response based on the indication of the at least one parameter. The speed of simulated speech of the response may be set based on one of a speed of speech of the vocal query or an identity of the user.

In accordance with another aspect, there is provided a method of dynamically formatting a response of a virtual personal assistant (VPA) service provider to a query of a user. The method comprises receiving an indication of a vocal query received from a user through an audio sensor of a device having access to the VPA service provider, receiving a response to the vocal query, and delivering the response to the user formatted as speech through an audio output of the device, at least one audio parameter of the response set based on at least one parameter of an environment proximate the device.

In some implementations, the method comprises formatting a volume of the response and/or a tone of the response based on the at least one parameter. An audio parameter of the response may be formatted based on an identity of the user. An audio parameter of the response may be formatted based on a time of day.

In accordance with another aspect, there is provided a smart speaker. The smart speaker comprises a microphone, at least one speaker, and a processor. The processor is configured to recognize a spoken user query received at the microphone, communicate the user query to a virtual personal assistant service provider, receive a response to the user query from the virtual personal assistant service provider, format the response as speech, and render the response to a user through the at least one speaker. At least one audio parameter of the response is set based on at least one parameter of an environment proximate the smart speaker.

The processor may be configured to set a volume of the response based on one or more of a volume of the spoken user query, a volume of background noise sensed by the microphone, an identity of the user, a location of the user, a time of day, a physiological parameter of the user, or a response by the user to a previous response provided by the smart speaker. The processor may be configured to set a tone of the response based on one or more of a tone of the spoken user query, a frequency spectrum of background noise sensed by the microphone, an identity of the user, a location of the user, or a time of day.

BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. In the drawings:

FIG. 1A is a simplified schematic view of an example of a smart speaker including functionality to access a VPA service provider;

FIG. 1B is a simplified schematic view of another example of a smart speaker including functionality to access a VPA service provider;

FIG. 1C illustrates a user providing a query to a smart speaker including functionality to access a VPA service provider and the smart speaker responding to the user;

FIG. 1D illustrates a system including a smart speaker in communication with a VPA service provider;

FIG. 2 illustrates communications between a smart speaker and various devices through a router;

FIG. 3 illustrates communications between a smart speaker and a device of a user;

FIG. 4 illustrates a vehicle equipped with functionality to access a VPA service provider; and

FIG. 5 illustrates headphones equipped with functionality to access a VPA service provider.

DETAILED DESCRIPTION

Aspects and implementations disclosed herein are not limited to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings. Aspects and implementations disclosed herein are capable of being practiced or of being carried out in various ways.

Aspects and implementations disclosed herein may be applicable to a wide variety of smart speakers having access to virtual personal assistant (VPA) services. Aspects and implementations of smart speakers disclosed herein include functionality that renders the smart speakers situationally aware. Based on one or more measured or monitored parameters, aspects and implementations of smart speakers may tailor a response provided by a VPA service to a user query or command (a VPA response) to be more appropriate given the one or more measured or monitored parameters. Tailoring the response to the user query or command may include providing the same content, e.g., the same words in the same sequence as would be provided in an untailored response, but with one or more audio components such as volume, tone, speed, etc., modified from a default volume, tone, speed, etc. Examples of measured or monitored parameters that the smart speaker may use to tailor a VPA response may include any one or more of volume of a user query or command, intonation of a user query or command, speed of speech in a user query or command, ambient noise level, time of day, day of the week, location of a user providing the query or command, identity of a user providing the query or command, or a response by a user to a response given by the VPA service through the smart speaker.

A simplified schematic diagram of a smart speaker is illustrated in FIG. 1A, generally at 100. The smart speaker 100 includes a body 105. On or within the body 105 are mounted one or more audio sensors, for example, microphones 110. In some implementations the provision of multiple microphones 110 may enable the smart speaker to triangulate the direction of origination of an audio signal, for example, a query or command of a user, or the location of an audio source, for example, a user. In other implementations a single directional microphone 110 may provide for the smart speaker to determine a direction of origination of the audio signal. The microphone or microphones 110 (hereinafter referred to in the singular for simplicity), along with other components of the smart speaker 100, are electrically connected to circuitry 115 including at least one processor disposed within the body 105 of the smart speaker. Circuitry 115 is illustrated in FIG. 1A as a single block, but may include multiple modules, sub-circuits, or processors for performing the various functions of the smart speaker 100, for example, audio signal processing circuitry, communications circuitry, memory, etc.
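By way of a non-limiting illustration, one way a direction of origination might be estimated from two microphones is from the difference in arrival time of the same sound at each microphone. The sketch below assumes a far-field source, a known microphone spacing, and a delay that has already been measured; the function name, the parameters, and the speed-of-sound constant are assumptions made for illustration and are not taken from the disclosure.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at room temperature


def arrival_angle_deg(delay_s: float, spacing_m: float) -> float:
    """Estimate the bearing of a far-field source from a two-microphone delay.

    delay_s is the arrival-time difference between the two microphones and
    spacing_m is the distance between them (both hypothetical inputs).
    Returns an angle in degrees, 0 meaning broadside to the microphone pair.
    """
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    ratio = SPEED_OF_SOUND_M_PER_S * delay_s / spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay corresponds to a source directly broadside to the microphone pair, and larger delays correspond to sources closer to the axis joining the microphones.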

The circuitry 115 may include or be electrically coupled to one or more antennae 120 (only one of which is illustrated in FIG. 1A). The antenna or antennae 120 (hereinafter referred to in the singular for simplicity) may be utilized by the smart speaker 100 to connect to a VPA service, the internet, or another source to search for information for a response to the query of a user. The smart speaker 100 may utilize the antenna 120 to connect to the internet via a cellular signal or via Wi-Fi through an internet router.

The antenna 120 may also be used by the smart speaker 100 to communicate with other devices. In a home or building equipped with smart devices, for example, a smart thermostat, lights, or appliances that may be controlled via Wi-Fi, etc., the smart speaker 100 may utilize the antenna 120 to relay commands from a user to the smart devices. For example, a user may provide an audio command to the smart speaker 100 to turn on the lights in a room of a house, and the smart speaker 100 may interpret the command and send the appropriate signal via Wi-Fi (in some implementations through a Wi-Fi router) to a controller of the lights in the room to turn the lights on. In other implementations, the smart speaker 100 may utilize the antenna 120 to locate or communicate with devices, for example, cellular telephones or other mobile computing devices to gather information or send commands to the devices. For example, in some implementations, the smart speaker 100 may utilize the antenna 120 to identify the IP address of a cellular telephone or other computing device in the proximity of the smart speaker 100 and may identify a user associated with the cellular telephone or other computing device. The smart speaker 100 may use a single antenna 120 to communicate via Wi-Fi, cellular, Bluetooth or other communication protocols, or may include dedicated antennae 120 for different communication protocols.

The smart speaker 100 further includes a power supply 125. The power supply 125 may receive power from an electrical outlet through a plug 130 and/or may include one or more batteries or other power storage devices.

At least one audio output, for example, speaker 135 is included in the smart speaker 100 for outputting audio responses to a query by a user and/or to provide alerts or verifications of receipt or completion of commands or to provide information regarding the condition or settings of the smart speaker 100. In some implementations, as illustrated in FIG. 1B, multiple speakers 135 may be provided. The multiple speakers 135 may be utilized to control a direction of audio output to a user.

A user interface 140, which may include manually activated controls, and in some implementations, a display, may be disposed on the body of the smart speaker 100 to allow for a user to change settings (e.g., power on/off, volume, etc.) of the smart speaker 100 or to program the smart speaker 100. In other implementations the setting and/or programming of the smart speaker 100 may be adjusted wirelessly through antenna 120, for example, using an application on a user's cell phone or other computing device. Additionally or alternatively the setting and/or programming of the smart speaker 100 may be adjusted using a cable connected to the user interface 140 and coupled to an external device, for example, a user's cell phone or other computing device.

The smart speaker 100 may be implemented in various form factors. In some implementations, the smart speaker 100 is designed to be placed in a fixed location on a desk or countertop. In other implementations, the smart speaker 100 is included in a vehicle. In further implementations, the smart speaker 100 is implemented in headphones or in software in a mobile computing device, for example, a cell phone. Implementations of the smart speaker 100 described above with reference to FIG. 1A may be modified as appropriate for a particular implementation. For example, when included in a vehicle, headphones, or cell phone the smart speaker 100 may not include a separate power supply 125 or plug 130 and the various components of the smart speaker 100 may be distributed throughout the vehicle, headphones, or cell phone or implemented in software or in hardware modules that are shared with other systems of the vehicle, headphones, or cell phone.

FIG. 1C illustrates one example of user interaction with a smart speaker 100. As shown, the user 195 may speak an audio query 100A to the smart speaker 100. The smart speaker 100 receives the audio query 100A through a microphone and performs a speech-to-text transformation of the audio query 100A. Alternatively, the smart speaker 100 records the audio query and sends it to another system, for example, a cloud-based VPA service provider 1000, as an audio file. The smart speaker 100 requests that the VPA service provider 1000 search in a database, for example, the internet or cloud, for information to respond to the user's query 100A. The smart speaker 100 may access the VPA service provider or other source of information via an internet router 150. The smart speaker 100 retrieves the information needed to respond to the user's query 100A from the VPA service provider in text form, performs a text-to-speech transformation on the received information and outputs an audio response 100B to the user 195. Alternatively, the text-to-speech transformation of the response may be performed by the VPA service provider and the response may be sent to the smart speaker 100 from the VPA service provider as an audio file.

In other implementations, VPA functionality, e.g., sending a request for information to a VPA service provider or other source of information and receiving a response to the request for information from the VPA service provider or other source of information, may be performed in a device separate from a device that receives a user query or command or renders the response to the user query or command. As illustrated in FIG. 1D, a user 195 may provide a spoken query 100A to a smart speaker 101. The user 195 may speak a wake word to the smart speaker 101 prior to providing the spoken query 100A so the smart speaker 101 will interpret the spoken query 100A as one to which the user 195 desires a response. The smart speaker 101 may include substantially the same components as the smart speaker 100 described with reference to FIGS. 1A and 1B, but may lack functionality to send queries to a VPA service provider or other data source and receive responses to queries from the VPA service provider or other data source. In one non-limiting example, the smart speaker 101 is one of the SoundTouch® audio playback devices available from the Bose Corporation or a similar streaming audio player. The smart speaker 101 may relay the spoken query 100A, optionally after recording the spoken query 100A, to a smart speaker 100 having the ability to request and receive a response to the user query 100A from a VPA service provider or other source of information as described above, for example, to a service provider or other source of information in the cloud 1000. The smart speaker 100 may receive a response to the user query from the VPA service provider or other source of information and communicate the response to the smart speaker 101 for rendering. The smart speaker 101 may render the response as an audio response 100B to the user 195 after applying appropriate signal conditioning to the response to vary one or more audio parameters of the response based on one or more environmental variables of the environment about the smart speaker 101.

Communications between the smart speaker 101, smart speaker 100, and VPA service provider 1000 may be through a router 150 as illustrated in FIG. 1D or may include direct communication (wired or wireless) between the smart speaker 101 and smart speaker 100.

It should be understood that reference to a smart speaker 100 herein includes systems in which a single component receives spoken user queries and provides audio responses to a user as well as requests and receives responses to the queries from an external source, as well as to systems as illustrated in FIG. 1D in which a first device receives user queries and renders responses to a user and a second device requests and receives responses to the user queries and communicates the responses to the first device for rendering.

As discussed above, the smart speaker 100 may be situationally aware and may tailor responses to user queries or commands based on one or more measured or monitored parameters. One parameter may be the volume of the voice of a user giving an oral query or command to the smart speaker 100. In some implementations, the smart speaker 100 may include a volume control that a user may set to a default level. In some instances the smart speaker 100 may respond to a query or command at a volume different than a set or default volume. For example, if a user is getting up early and other members of the user's household are still asleep, a user may verbally query the smart speaker 100 for information such as the day's weather forecast at a low volume. The smart speaker 100 may receive the query through a microphone 110 and determine if the volume of the user's voice is higher or lower than typical when interacting with the smart speaker 100 or higher or lower than a pre-determined threshold. If the query was provided by the user speaking softly, the smart speaker 100 may provide an audible response to the query at a volume lower than the default or set volume. In another example, the smart speaker 100 may provide a response that is louder than usual, for example, to overcome background noise or if the user is far from the smart speaker 100. The user may provide a query or command to the smart speaker 100 in an elevated volume and the smart speaker 100 will respond to the query or command at a corresponding volume higher than the default or set volume. In some implementations the smart speaker 100 may have a set number of volume adjustment levels, for example, 25%, 50%, 100%, 125%, 150%, and 200% of the default or set volume. In other implementations the smart speaker 100 may respond to a user's query at a volume within a continuum of volumes based on the volume of the user's query.
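As a non-limiting sketch of the two volume-selection approaches described above, the following maps a measured query level to either a continuous volume multiplier or the nearest of the fixed adjustment levels. The decibel inputs, the linear mapping, the slope, and all function names are assumptions made for illustration only; the disclosure does not specify how the mapping is computed.

```python
# Fixed adjustment levels named in the description, as multipliers of the
# default or set volume.
VOLUME_LEVELS = (0.25, 0.50, 1.00, 1.25, 1.50, 2.00)


def continuous_multiplier(query_db: float, typical_db: float,
                          slope: float = 0.05,
                          lowest: float = 0.25, highest: float = 2.0) -> float:
    """Scale the response volume with how far the query deviates from typical.

    The linear rule and the slope value are hypothetical choices.
    """
    raw = 1.0 + slope * (query_db - typical_db)
    return max(lowest, min(highest, raw))


def stepped_multiplier(query_db: float, typical_db: float) -> float:
    """Snap the continuous multiplier to the nearest fixed adjustment level."""
    target = continuous_multiplier(query_db, typical_db)
    return min(VOLUME_LEVELS, key=lambda level: abs(level - target))
```

Under these assumed values, a query spoken at the user's typical level leaves the volume unchanged, while a much quieter or much louder query moves the response toward the 25% or 200% level respectively.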

In other implementations, the smart speaker 100 may respond to queries or commands in a tone that is based on the volume or tone of a command or query provided by a user to the smart speaker 100. For example, if the user provides a command or query to the smart speaker 100 at a low volume or in a whisper, the smart speaker may provide an audible response to the command or query that sounds like a whisper. Conversely, if the user shouts a command or query to the smart speaker 100, the smart speaker 100 may respond with an audible response that sounds like a shout. In some implementations, if the user provides the query or command in a particular tone, for example, if a child provides the query or command in a tone typical of that of a voice of a child, the smart speaker 100 may respond with a response that sounds like the voice of a child. Similarly, the tone of the response provided by the smart speaker 100 may be provided as a female voice or a male voice based on whether the user query or command is provided in a female or a male voice. Signal processing circuitry in the circuitry 115 of the smart speaker 100 may apply audio filtering/signal processing to the response to produce the response in the alternate tone.

The smart speaker 100 may additionally or alternatively modify the tone of a response to a command or query from a default tone by applying a bandpass filter to the response. Low frequencies of sound tend to propagate further through buildings than do higher frequencies. Accordingly, if the user provides a command or query to the smart speaker 100 at a low volume or if the smart speaker 100 receives some other indication that it should provide an audio response that does not propagate far, the smart speaker 100 may apply a high pass filter to the audio response, removing low frequency components from the response.
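For illustration only, a high pass filter of the kind mentioned above could be as simple as a first-order (single-pole) discrete-time filter applied to the response samples. The cutoff frequency, the sample rate, and the function name below are assumptions; the disclosure does not specify a filter order or design.

```python
import math
from typing import List, Sequence


def high_pass(samples: Sequence[float], cutoff_hz: float = 300.0,
              sample_rate_hz: float = 16000.0) -> List[float]:
    """Apply a first-order high pass filter to a sequence of audio samples.

    Uses the standard RC-filter recurrence
        y[i] = alpha * (y[i-1] + x[i] - x[i-1]).
    The default cutoff and sample rate are hypothetical.
    """
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [float(samples[0])]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out
```

With this recurrence, a constant (zero-frequency) input decays toward zero, while a rapidly alternating input passes through with little attenuation, which is the behavior wanted for a response that should not carry far through a building.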

In some implementations, the smart speaker 100 may include functionality to determine the rapidity of speech of a user providing a command or query to the smart speaker 100 and may adjust the rapidity of rendition of an audio response to the command or query based on the rapidity of speech of the user. If the user provides the command or query by speaking slowly, the smart speaker 100 may provide an audio response to the command or query slowly. Such functionality may be useful if the user is not fluent in English or whatever other language the smart speaker 100 is programmed to respond in and the user wishes to receive a response from the smart speaker more slowly than a default or set response speed to assist the user in understanding the response. Conversely, if the user provides the command or query by speaking rapidly, the smart speaker 100 may provide an audio response to the command or query in a manner replicating rapid speech. Such functionality may be useful if the user is in a hurry and the user wishes to receive a response from the smart speaker more quickly than a default or set response speed.
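One hypothetical way to match the response speed to the user's speech is to estimate the query's words per minute and scale the default rendering rate by the ratio, within limits. The default rate, the limits, and the function name below are illustrative assumptions rather than values from the disclosure.

```python
def response_rate_scale(query_words: int, query_seconds: float,
                        default_wpm: float = 150.0,
                        min_scale: float = 0.6,
                        max_scale: float = 1.5) -> float:
    """Return a multiplier for the default text-to-speech speaking rate.

    A value below 1.0 slows the response and a value above 1.0 speeds it up.
    Falls back to 1.0 (no change) when the query cannot be measured.
    """
    if query_words <= 0 or query_seconds <= 0:
        return 1.0
    query_wpm = query_words / query_seconds * 60.0
    scale = query_wpm / default_wpm
    # Clamp so that the response stays intelligible at either extreme.
    return max(min_scale, min(max_scale, scale))
```

Clamping the scale is a design choice in this sketch so that an unusually slow or fast query does not produce an unintelligible response.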

In other implementations, the smart speaker 100 may monitor or measure the volume level and/or frequency spectrum of background or ambient noise when receiving a wake up phrase and prior to or during receiving a command or query from a user. The smart speaker 100 may take the volume level and/or frequency spectrum of background or ambient noise into account when formatting an audio response to a user command or query. If there is a large amount or high volume of background noise from, for example, multiple people talking or from a television or radio in the vicinity of the smart speaker 100, or due to the smart speaker 100 being in an outdoor environment, the smart speaker 100 may output responses at a higher volume than a default or set volume. In some implementations the smart speaker 100 may have a set number of volume adjustment levels, for example, 25%, 50%, 100%, 125%, 150%, and 200% of the default or set volume. In other implementations the smart speaker 100 may respond to a user's query at a volume within a continuum of volumes based on the volume of the background or ambient noise. Additionally or alternatively, the smart speaker 100 may analyze the frequency spectrum of the background or ambient noise. The smart speaker 100 may modify the volume of only certain frequencies of an audio response, for example, frequencies at which the background or ambient noise is louder than at other frequencies. In other implementations, the smart speaker 100 may increase the volume of the response at frequencies at which the background or ambient noise is softer or less loud than at other frequencies so that the audio response may be more readily distinguished from the background or ambient noise by a user.
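The two frequency-dependent approaches described above can be sketched as a per-band gain selection. The sketch assumes the noise spectrum has already been reduced to one level per frequency band, and uses the mean band level as the dividing line; the band representation, the boost amount, and the names are assumptions made for illustration.

```python
from typing import List, Sequence


def band_gains_db(noise_db: Sequence[float], boost_db: float = 6.0,
                  strategy: str = "loud") -> List[float]:
    """Choose a gain, in dB, for each frequency band of the response.

    strategy="loud" boosts bands where the noise is louder than average
    (to overcome the noise); strategy="quiet" boosts bands where the noise
    is softer than average (to make the response easier to distinguish).
    """
    if strategy not in ("loud", "quiet"):
        raise ValueError("strategy must be 'loud' or 'quiet'")
    if not noise_db:
        return []
    mean_db = sum(noise_db) / len(noise_db)
    if strategy == "loud":
        return [boost_db if level > mean_db else 0.0 for level in noise_db]
    return [boost_db if level < mean_db else 0.0 for level in noise_db]
```

For example, with band noise levels of 40, 60, and 50 dB, the "loud" strategy boosts only the middle band and the "quiet" strategy boosts only the first band.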

The smart speaker 100 may dynamically adjust the volume and/or tone of responses to commands or queries based on factors other than parameters of the voice of a user that provides a command or query to the smart speaker 100. In some implementations, the smart speaker 100 may include a clock (e.g., within circuitry 115, FIG. 1A) or may receive indications of the time and may be programmed to provide audio responses to commands or queries at a reduced volume below the default or set volume during nighttime hours when members of a household may be expected to be sleeping. The reduced volume may be a volume that is set by a user of the smart speaker 100. Additionally or alternatively, the smart speaker 100 may be programmed to provide audio responses to commands or queries in a tone resembling a whisper or with lower frequencies of the audio response suppressed (e.g., with a high pass filter applied to the audio response) during nighttime hours or other time periods set by a user.
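A minimal sketch of the time-based behavior follows, assuming a user-configurable quiet period that may span midnight. The default start and end times and the function names are hypothetical.

```python
from datetime import time


def is_quiet_hours(now: time, start: time = time(22, 0),
                   end: time = time(7, 0)) -> bool:
    """Return True if `now` falls within the quiet period [start, end).

    Handles periods that wrap past midnight, such as 22:00 to 07:00.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def response_volume(now: time, default_volume: float,
                    reduced_volume: float) -> float:
    """Pick the user-set reduced volume during quiet hours, else the default."""
    return reduced_volume if is_quiet_hours(now) else default_volume
```

The same check could equally gate a whisper-like tone or a high pass filter instead of, or in addition to, the reduced volume.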

In addition, the smart speaker 100 may query other smart devices that it may be in communication with, for example, lights 160 or entertainment system(s) 170 (e.g., other smart speakers, televisions, radios, etc.), or even physiological monitors 180 of residents of the household, either directly or via a Wi-Fi router 150 (See FIG. 2). If the lights 160 or entertainment system(s) 170 are active and/or if the physiological monitors 180 indicate that the residents of the household are awake, this may provide an indication that people in the household are awake and a reduced volume response to queries or commands may not be warranted and the smart speaker 100 may respond to queries or commands at a default or set volume. If the lights 160 or entertainment system(s) are inactive and/or if the physiological monitors indicate that the residents of the household are asleep this may provide an indication that people in the household are asleep and a reduced volume response to queries or commands may be warranted. In other implementations, a physiological monitor 180 of a user may provide information regarding the heart rate or respiration rate of a user. An elevated heart rate or respiration rate of the user may be indicative of the user being in an excited state or in a state of exercising or having just finished exercising. In response to the smart speaker 100 receiving an indication from the physiological monitor 180 of the user of an elevated heart rate or respiration rate of the user, the smart speaker 100 may provide audio responses to queries or commands of the user at a volume elevated above a default or set volume.
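The device-state logic described above might be sketched as follows, assuming the states of the lights, entertainment systems, and physiological monitors have already been collected into a simple record. The record type, its field names, and the rule for combining the signals are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HouseholdState:
    """Hypothetical snapshot of devices the smart speaker has queried."""
    lights_on: bool
    entertainment_on: bool
    # None when no physiological monitor data is available.
    residents_awake: Optional[bool] = None


def household_appears_awake(state: HouseholdState) -> bool:
    """Treat any active device or an awake indication as evidence of activity."""
    return bool(state.lights_on or state.entertainment_on
                or state.residents_awake)


def select_volume(state: HouseholdState, default_volume: float,
                  reduced_volume: float) -> float:
    """Use the default volume if the household appears awake, else reduce it."""
    if household_appears_awake(state):
        return default_volume
    return reduced_volume
```

In this sketch any single active signal is enough to keep the default volume; a more conservative rule could require agreement between several signals.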

The smart speaker 100 may also include functionality to communicate with one or more other devices 190, for example, a cell phone, a smart watch, etc. of a user 195. (See FIG. 3.) The smart speaker 100 may query the device 190 for information from which it may derive an identity of the user. For example, the smart speaker 100 may request the IP address of the device 190 and attempt to match the IP address of the device 190 to a user in a lookup list in a memory of the circuitry 115 of the smart speaker 100 or in the cloud 1000 (FIG. 1B). Once the VPA determines an identity of the user 195, it may tailor responses to queries or commands from the user in a manner appropriate for the user. For example, if the smart speaker 100 determines that the user is an elderly person who may have some hearing loss, the smart speaker 100 may respond to queries or commands from the user with an audio response provided at an increased volume relative to a default or set volume. In some implementations, different users may program the smart speaker 100 with different preferences for responses to queries or commands, for example, pitch of the audio response, whether the audio response should be in a female or male voice, volume of the audio response, speed of the audio response, etc.
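The lookup described above can be illustrated with two small tables: one mapping a device address to a user, and one mapping a user to stored response preferences. The addresses, user names, preference keys, and values below are all invented for illustration.

```python
from typing import Dict

# Hypothetical lookup list: device IP address -> user name.
USER_BY_DEVICE_ADDRESS: Dict[str, str] = {
    "192.168.1.23": "alice",
    "192.168.1.42": "bob",
}

# Hypothetical per-user preferences; unspecified keys fall back to defaults.
USER_PREFERENCES: Dict[str, Dict[str, object]] = {
    "alice": {"volume_scale": 1.5},
    "bob": {"voice": "male", "rate_scale": 1.2},
}

DEFAULT_PREFERENCES: Dict[str, object] = {
    "volume_scale": 1.0,
    "rate_scale": 1.0,
    "voice": "female",
}


def preferences_for_device(ip_address: str) -> Dict[str, object]:
    """Return response preferences for the user matched to a device address.

    Unknown devices, and users without stored preferences, get the defaults.
    """
    user = USER_BY_DEVICE_ADDRESS.get(ip_address)
    merged = dict(DEFAULT_PREFERENCES)
    merged.update(USER_PREFERENCES.get(user, {}))
    return merged
```

Here the hypothetical user "alice" receives responses at an increased volume, as might suit a user with some hearing loss, while an unrecognized device yields the default response settings.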

In some implementations, the smart speaker 100 may listen for an indication that a user did not understand an audio response provided by the smart speaker 100 to a command or query. If such an indication is detected the smart speaker 100 may repeat the audio response, potentially at a slower “speaking” rate and/or a higher volume. For example, if the smart speaker 100 provides an audio response to a command or query of a user at a first speaking rate and a first volume, if the user responds with a vocal phrase indicative that the user did not understand the response, for example, by saying “What?,” “What's that?,” “Huh?,” “I don't understand,” “please repeat,” etc., the smart speaker 100 may repeat the response at a slower speaking rate and/or higher volume.
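A simple, hypothetical realization of this behavior matches the transcribed follow-up utterance against the example phrases given above and, on a match, derives a slower rate and higher volume for the repeat. The phrase-matching approach, the adjustment factors, and the function names are assumptions.

```python
from typing import Tuple

# Example phrases taken from the description, normalized to lower case.
CONFUSION_PHRASES = {
    "what",
    "what's that",
    "huh",
    "i don't understand",
    "please repeat",
}


def normalize(utterance: str) -> str:
    """Lower-case the text and trim surrounding whitespace and punctuation."""
    text = utterance.lower().replace("\u2019", "'")
    return text.strip(" \t\n?!.,\"\u201c\u201d")


def indicates_confusion(utterance: str) -> bool:
    """Return True if the utterance matches one of the known phrases."""
    return normalize(utterance) in CONFUSION_PHRASES


def retry_settings(rate: float, volume: float, rate_factor: float = 0.8,
                   volume_factor: float = 1.25) -> Tuple[float, float]:
    """Return a slower speaking rate and a higher volume for the repeat."""
    return rate * rate_factor, volume * volume_factor
```

Exact phrase matching is used here only to keep the sketch short; it deliberately does not treat a new question that merely begins with "What" as a request to repeat.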

Smart speakers 100 equipped with multiple microphones 110 or with a directional microphone may detect the direction from which a user command or query originated and/or a location of a user providing the command or query. The smart speaker 100 may use this direction or location information to direct a response toward the user that gave the command or query. For example, in smart speakers 100 including more than one speaker 135 (see, e.g., FIG. 1B), the relative volumes of the different speakers may be modulated to direct the majority of the sound energy in the response in the direction of the user that provided the command or query. In some implementations, a smart speaker 100 may be in a master-servant relationship with other smart speakers 100 or may have the capability of controlling a separate smart speaker or audio device. In such implementations, the smart speaker 100 that received the results of a query from a user may direct a smart speaker 100 or separate smart speaker or audio device that is closer to the user than the smart speaker 100 that received the results of a query to provide the audible response to the query to the user.
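Both ideas in the preceding paragraph can be sketched briefly: weighting the output of several speakers toward the user's bearing, and choosing the device nearest the user. The cosine weighting, the minimum weight, the coordinate representation, and the names are assumptions made for illustration and are not specified in the disclosure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def speaker_gains(user_bearing_deg: float,
                  speaker_bearings_deg: Sequence[float],
                  floor: float = 0.1) -> List[float]:
    """Return relative gains (summing to 1) favoring speakers facing the user.

    Each speaker is weighted by the cosine of the angle between its facing
    direction and the user's bearing, with a small floor so that no speaker
    is silenced entirely.
    """
    weights = []
    for bearing in speaker_bearings_deg:
        diff = math.radians(user_bearing_deg - bearing)
        weights.append(max(floor, math.cos(diff)))
    total = sum(weights)
    return [w / total for w in weights]


def nearest_device(user_xy: Tuple[float, float],
                   devices: Dict[str, Tuple[float, float]]) -> str:
    """Return the name of the device closest to the user's position."""
    return min(devices, key=lambda name: math.dist(user_xy, devices[name]))
```

In the first function most of the output is assigned to the speaker or speakers facing the user; in the second, the response would be handed to whichever controlled device is closest to the user.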

In one particular implementation, a smart speaker 100 having functionality to access a VPA service may be implemented in the electronics of a vehicle, for example, a car or SUV. The smart speaker 100 may include dedicated circuitry 115, or may be implemented as software in a computer of the vehicle 200. As disclosed in international applications PCT/US2017/021521 and PCT/US2017/021625, assigned to the same assignee as the present application, the vehicle 200 may include shared speakers 205 and dedicated speakers 210 for each passenger in the vehicle. The dedicated speakers 210 may be built into the headrest or other portion of the seats of the vehicle 200. (See FIG. 4.) The vehicle 200 may include one or more microphones 215 in a portion of the vehicle 200, for example, in the dashboard, or in the headrests of the seats of the vehicle. The microphones 215 may be utilized to receive a query or command to the smart speaker implemented in the electronics of a vehicle and may be used by the smart speaker to determine which user/seat the query or command originated from. An audio response to the query or command may be provided by the smart speaker to the dedicated speakers associated with the user/seat the query or command originated from. In some implementations, cross talk cancellation or noise cancelling technologies may be utilized by the dedicated speakers associated with the user/seats other than that which the query or command originated from to at least partially lower the perceived volume of the audio response at the user/seats other than that which the query or command originated from.
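The seat-based routing described above might be sketched as follows, under the simplifying assumption that each seat has its own microphone and that the seat with the highest measured level is the origin of the query. The seat names, the level values, and the action labels "play" and "attenuate" are hypothetical.

```python
from typing import Dict


def seat_of_origin(mic_levels: Dict[str, float]) -> str:
    """Return the seat whose microphone measured the highest level."""
    return max(mic_levels, key=mic_levels.get)


def route_response(mic_levels: Dict[str, float]) -> Dict[str, str]:
    """Assign an action to each seat's dedicated speakers.

    The originating seat plays the response; the other seats are marked for
    attenuation (for example, by cross talk cancellation).
    """
    origin = seat_of_origin(mic_levels)
    return {
        seat: "play" if seat == origin else "attenuate"
        for seat in mic_levels
    }
```

A production system would likely rely on more robust localization than a simple level comparison; the sketch is intended only to show the routing decision.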

A smart speaker having functionality to access a VPA service may, in some implementations, be included in headphones. As illustrated in FIG. 5, headphones, indicated generally at 300, may include a pair of ear cups 310 coupled to one another by a headband 320. Each of the ear cups 310 includes at least one speaker (not shown for the sake of clarity) to deliver audio, for example, music to the ears of a user. VPA functionality may be provided in the headphones 300 by incorporating VPA access circuitry 115 and one or more microphones 110 into the headphones 300, for example, into one or more of the ear cups 310. The VPA access circuitry 115 may include one or more antennae 120 to enable communication with the internet or other devices as described with reference to the various implementations of the smart speaker 100 above. In some implementations, the headphones 300 are wireless headphones and the antenna 120 may be shared between the VPA access circuitry 115 and circuitry of the headphones 300 used to provide wireless connectivity. In some implementations, the headphones 300 may also include a location determination system, for example, a GPS receiver 330 and associated circuitry and/or a physiological monitor 180 (e.g., a heart rate monitor). It should be understood that the various components illustrated in FIG. 5, for example, VPA access circuitry 115, antenna 120, GPS receiver 330, microphones 110, or physiological monitor 180 may be located in different positions within the headphones 300, for example, at least partially in the headband 320, or may be at least partially included in a device to which the headphones 300 wirelessly connect, for example, a portable music player or cell phone. Electrical connections between components are omitted from FIG. 5 for clarity.

In use, a user may provide a vocal query or command to the headphones 300 to which the VPA functionality provides a response. The response may be provided through the speakers in the ear cups 310, optionally while reducing the volume of or muting any music or other audio content the user is listening to through the headphones 300. The volume or rapidity of speech of the response may be modified based on the volume or rapidity of speech of the user's query or command as discussed above with reference to the various implementations of the smart speaker 100. The headphones 300 may also alter the volume or speed of an audio response to a user's query or command based on other factors such as heart rate of the user, identity of the user, etc., as discussed above with reference to the various implementations of the smart speaker 100. The headphones 300 may be capable of determining a location of a user wearing the headphones 300 using the GPS receiver 330 and may adjust the volume of an audio response to a user's query or command based on the location of the user, for example, providing louder responses if the user is in an urban environment, and softer responses if the user is in a rural location.

In any of the above implementations, the user may cause the smart speaker 100 to respond in a manner that takes into account any of the parameters discussed above by providing a modified or alternate wake up phrase to the smart speaker 100 prior to providing the command or query. For example, if the user wishes the smart speaker to take the user's voice volume into account to adjust the volume of a response, the user may say the wake up phrase “Volume, Siri” instead of “Hey, Siri.” Alternatively, the smart speaker 100 may include a switch or may respond to a spoken command to enable or disable the modification or formatting of responses based on the one or more monitored or measured parameters.
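As a rough sketch, the selection of parameters by wake up phrase could be expressed as a table that maps each phrase to the set of parameters to be taken into account. The two phrases come from the example above; the parameter label `"query_volume"`, the table name, and the function name are assumptions for illustration, and the utterance is assumed to be already transcribed to text.

```python
from typing import FrozenSet, Optional, Tuple

# Hypothetical mapping of wake up phrases to the parameters each one enables.
WAKE_PHRASES = {
    "hey, siri": frozenset(),                     # default behavior
    "volume, siri": frozenset({"query_volume"}),  # adapt to the user's voice volume
}


def parse_wake_phrase(utterance: str) -> Optional[Tuple[FrozenSet[str], str]]:
    """Split an utterance into (enabled parameters, remaining query).

    Returns None when the utterance does not begin with a known wake up phrase.
    """
    text = utterance.strip()
    lowered = text.lower()
    for phrase, parameters in WAKE_PHRASES.items():
        if lowered.startswith(phrase):
            # Drop the wake up phrase and any separator that follows it.
            query = text[len(phrase):].lstrip(" ,.")
            return parameters, query
    return None
```

A switch or spoken enable/disable command, as mentioned above, could be modeled in the same way by overriding the returned parameter set.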

Prophetic Example 1:

A smart speaker located in the kitchen of a household of a user has a volume selector set to a first level. The user wakes up in the middle of the night while other members of the household are asleep and goes to the kitchen for a midnight snack. The user speaks a wake up word to the smart speaker in a whisper and asks the smart speaker what the weather forecast for the following day is. The smart speaker compares the volume of the user request regarding the weather forecast against a pre-set volume threshold and determines that the user is speaking at a reduced volume compared to the pre-set volume threshold. The smart speaker sends the user request to a VPA enabled device through a Wi-Fi network in the household of the user. The VPA queries a VPA service provider in the cloud for the weather report for the following day and receives the weather report from the VPA service provider. The VPA communicates the weather report to the smart speaker. The smart speaker renders the weather report to the user at a reduced volume as compared to a volume the smart speaker would render audio based on the set volume level.
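The threshold comparison in this example might be sketched as follows, assuming the query is available as normalized audio samples in the range -1.0 to 1.0 and that its level is measured as an RMS value in decibels relative to full scale. The function names, the use of RMS, and the 0.5 reduction factor are assumptions made for this sketch.

```python
import math
from typing import Sequence


def rms_level_db(samples: Sequence[float]) -> float:
    """Return the RMS level of the samples in dB relative to full scale."""
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Guard against log10(0) for completely silent input.
    return 20.0 * math.log10(max(rms, 1e-12))


def response_volume(query_level_db: float,
                    threshold_db: float,
                    set_volume: float,
                    reduction: float = 0.5) -> float:
    """Return the rendering volume for the response.

    If the query was spoken below the pre-set threshold (for example, a
    whisper), the set volume is scaled down; otherwise it is used unchanged.
    """
    if query_level_db < threshold_db:
        return set_volume * reduction
    return set_volume
```

Under these assumptions, a whispered query measured at -40 dB against a -30 dB threshold would cause a set volume of 0.8 to be rendered at 0.4.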

Prophetic Example 2:

An elderly user speaks a wake up word and requests information regarding the readiness of a prescription to a smart speaker. The smart speaker detects that a mobile phone has come into proximity of the smart speaker. The smart speaker queries the mobile phone for identification information, for example, an IP address, and receives the identification information from the mobile phone. The smart speaker conveys the information request and identification information from the mobile phone to a VPA enabled device through a Wi-Fi network in the household of the user. The VPA enabled device queries a VPA service provider in the cloud for information regarding the readiness of the prescription and receives a response that the prescription is ready for pickup. The VPA service provider also accesses an account associated with the VPA and searches for information correlating the identification information from the mobile phone with a particular user and determines that the mobile phone belongs to the elderly user. The VPA communicates the information regarding the readiness of the prescription and the identity of the user to the smart speaker. The smart speaker renders the information regarding the readiness of the prescription to the user at a preset volume associated with the elderly user in a memory of the smart speaker that is increased as compared to a default volume level and a preset speed associated with the elderly user in the memory of the smart speaker that is reduced as compared to a default response speech speed.

Having thus described several aspects of at least one implementation, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the disclosure. The acts of methods disclosed herein may be performed in alternate orders than illustrated, and one or more acts may be omitted, substituted, or added. One or more features of any one example disclosed herein may be combined with or substituted for one or more features of any other example disclosed. Accordingly, the foregoing description and drawings are by way of example only.

The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. As used herein, the term “plurality” refers to two or more items or components. As used herein, dimensions which are described as being “substantially similar” should be considered to be within about 25% of one another. The terms “comprising,” “including,” “carrying,” “having,” “containing,” and “involving,” whether in the written description or the claims and the like, are open-ended terms, i.e., to mean “including but not limited to.” Thus, the use of such terms is meant to encompass the items listed thereafter, and equivalents thereof, as well as additional items. Only the transitional phrases “consisting of” and “consisting essentially of,” are closed or semi-closed transitional phrases, respectively, with respect to the claims. Use of ordinal terms such as “first,” “second,” “third,” and the like in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.

What is claimed is:
 1. A method of operating a situationally aware speaker, the method comprising: receiving an indication of at least one parameter of an environment proximate the situationally aware speaker; receiving audio information at the situationally aware speaker from a virtual personal assistant, wherein receiving the audio information comprises receiving an audio response to a query spoken to the situationally aware speaker by a user; and modifying the audio information based on the indication of the at least one parameter, wherein modifying the audio information comprises varying a volume of the audio response based on at least one of a volume of the query, an identity of the user, or a time of day.
 2. The method of claim 1, wherein modifying the audio information includes one or more of adjusting a volume, a tone, an equalization, a tone, or a speed of rendering of the audio information.
 3. The method of claim 1, wherein the at least one parameter includes one or more of a volume of a query provided to the situationally aware speaker, a volume of background noise, a frequency spectrum of background noise, an identity of the user, a location of the user, a time of day, a physiological parameter of a person proximate the situationally aware speaker, a response by the user to a previous response provided by the situationally aware speaker, or a state of activity of one or more devices proximate the situationally aware speaker.
 4. The method of claim 1, further comprising rendering the audio response through the situationally aware speaker.
 5. The method of claim 1, wherein modifying the audio information comprises varying the volume of the response based on the volume of the query.
 6. The method of claim 1, further comprising setting the volume of the response based on a volume of background noise sensed by an audio sensor of the situationally aware speaker.
 7. The method of claim 1, further comprising setting the volume of the response based on a physiological parameter of a person within hearing distance of the situationally aware speaker.
 8. The method of claim 1, further comprising setting the volume of the response based on a state of activity of one or more devices proximate the situationally aware speaker.
 9. The method of claim 1, further comprising setting a tone of the response based on the indication of the at least one parameter.
 10. The method of claim 9, further comprising formatting the response as one of a simulated whisper, a simulated shout, or with low frequency components of the response removed.
 11. The method of claim 9, further comprising setting a tone of the response based on one or more of a volume of the vocal query, a volume of background noise sensed by the audio sensor, an identity of the user, a location of the user, a time of day, a physiological parameter of a person within hearing distance of the situationally aware speaker, a frequency spectrum of background noise sensed by the audio sensor, or a response by the user to a previous response provided by the situationally aware speaker.
 12. The method of claim 1, further comprising setting a speed of simulated speech of the response based on one of a speed of speech of the vocal query or an identity of the user.
 13. A method of dynamically formatting a response of a virtual personal assistant (VPA) service provider to a query of a user, the method comprising: receiving an indication of a vocal query received from a user through an audio sensor of a device having access to the VPA service provider; receiving a response to the vocal query; and delivering the response to the user formatted as speech through an audio output of the device, at least one audio parameter of the response set based on at least one parameter of an environment proximate the device, wherein a volume of the response is varied based on at least one of a volume of the query, an identity of the user, or a time of day.
 14. The method of claim 13, further comprising formatting the volume of the response based on the at least one parameter.
 15. The method of claim 13, further comprising formatting a tone of the response based on the at least one parameter.
 16. The method of claim 13, further comprising formatting an audio parameter of the response based on an identity of the user.
 17. The method of claim 13, further comprising formatting an audio parameter of the response based on a time of day.
 18. A smart speaker comprising: a microphone; at least one speaker; and a processor configured to: recognize a spoken user query received at the microphone; communicate the user query to a virtual personal assistant service provider; receive a response to the user query from the virtual personal assistant service provider; format the response as speech, at least one audio parameter of the response set based on at least one parameter of an environment proximate the smart speaker; vary a volume of the response based on at least one of a volume of the user query, an identity of the user, or a time of day; and render the response to a user through the at least one speaker.
 19. The smart speaker of claim 18, wherein the processor is configured to set the volume of the response based on one or more of a volume of background noise sensed by the microphone, a location of the user, a physiological parameter of the user, or a response by the user to a previous response provided by the smart speaker.
 20. The smart speaker of claim 18, wherein the processor is configured to set a tone of the response based on one or more of a tone of the spoken user query, a frequency spectrum of background noise sensed by the microphone, an identity of the user, a location of the user, or a time of day.